ABSTRACT

A need exists for a standardized battery of human performance
tests in order to measure the effects of various treatments.
The present paper reports on progress in such a program, funded
jointly by NASA and the Navy. Three batteries are available, which
differ in length (7.5, 15, and 30 minutes) and in the number of
tests in the battery (3, 10, and 15). All tests are implemented on
a portable, lap-held, briefcase-size microprocessor (NEC PC 8201A).
Performances measured include information processing, memory,
visual perception, reasoning, and motor skills. Programs are
currently underway to determine norms, reliabilities,
stabilities, factor structure of the tests, comparisons with
marker tests, and apparatus suitability. The rationale for the
battery is provided.


INTRODUCTION 

We originally set out to standardize a battery of human
performance tests in response to a Navy requirement to study the
effects of ship motion on humans. The focus of that program
centered on repeated measures because nearly all studies of the
effects of exotic environments on humans follow such a paradigm.
Because of this, two statistical properties of tests, stability
and reliability, received more attention in our program than in
those reported by others. Validity and factor structure, often
examined first by others, have been left until later in the
program. We continue to argue that this is the correct emphasis
because without the first two properties, the second two cannot
be meaningfully determined.

The results of that program, called PETER (Performance
Evaluation Tests for Environmental Research), were reported in a
series of 90 publications (cf. Harbeson, Bittner, Kennedy,
Carter, & Krause, 1983, for a complete list). A recent review
reported on 114 tests and considered 30 suitable for
incorporation into a battery (Bittner, Carter, Kennedy, Harbeson,
& Krause, 1984). The criteria considered important for such a
battery are listed in Table I. Results for the tests judged good
appear in Table II.

Everyone ordinarily concurs that stability and reliability 
are important issues in testing, but it is not always evident to 
what extent. What follows is our rationale for selecting these 
two as our focus. 
Table 1. Definition of Task Features

FEATURE                  DEFINITION

NAME                     Name of the task or measure as used in the
                         literature.

FACTOR                   The factor(s) assessed by the measure as
                         identified in the literature or by judgments
                         of the authors.

DOMAIN                   Characterization of the domain(s) of
                         assessment of the capability as cognitive,
                         perceptual (including sensory), or motor.

ADMINISTRATION TIME      The typical testing time for a measure; this
                         includes all testing time required to obtain
                         a score (e.g., components of a derived
                         score).

TYPE OF ADMIN.           Identification of task as individually or
                         group administered.

TOTAL STABILIZATION      The total stabilization time is the amount
TIME IN MINUTES          of elapsed experimental time (whether massed
(DIFFERENTIAL)           or distributed) required for mean, variance,
                         and differential (correlational)
                         stabilization. (The amount of elapsed
                         practice time required for differential
                         stabilization alone is in parentheses.)

RELIABILITY              The differentially stabilized reliability
EFFICIENCY               normalized to a 3-minute administration.
(3 minutes)              Normalization to 3 minutes was by the
                         Spearman-Brown equation (Bittner & Carter,
                         1981; Winer, 1971).

REFERENCES               Cited in order are the relevant stability
                         study, the original source of the measure,
                         and occasionally other significant
                         references.
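
The reliability efficiency column can be illustrated with a short sketch. It assumes the standard Spearman-Brown prophecy formula, r_k = k r / (1 + (k - 1) r), with k taken as the ratio of the 3-minute base to the actual administration time; the function names and the example values are hypothetical and are not taken from the PETER data.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


def reliability_efficiency(reliability: float, admin_minutes: float,
                           base_minutes: float = 3.0) -> float:
    """Normalize an observed reliability to a common administration time.

    base_minutes defaults to 3, the base used in Table 1; the function
    itself is an illustrative assumption, not the authors' code.
    """
    if admin_minutes <= 0:
        raise ValueError("admin_minutes must be positive")
    return spearman_brown(reliability, base_minutes / admin_minutes)


if __name__ == "__main__":
    # A 1-minute test with reliability 0.50, projected to 3 minutes.
    print(round(reliability_efficiency(0.50, admin_minutes=1.0), 2))  # 0.75
    # A 6-minute test with reliability 0.90, projected down to 3 minutes.
    print(round(reliability_efficiency(0.90, admin_minutes=6.0), 2))  # 0.82
```

Normalizing in this way lets short and long tests be compared on reliability per unit of testing time rather than on raw reliability.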
Reliability: If performances differ between subjects on a
test, those differences may be due to unforeseen, uncontrolled,
and perhaps unrelated factors, in which case the between-subject
differences are considered to be error. Alternatively, they may
reflect differences in capability, in which case they are
considered true; if these true differences can be measured, they
can improve the precision of the statistic employed in studying
the potential effects of treatments.

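For example, the equation below is one of the well-known
variants of Student's t (Winer, 1971) for testing the difference
in means ($\bar{X}$) between two independent groups:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{SD_1^2}{N_1} + \dfrac{SD_2^2}{N_2}}} \qquad (1) $$

Moreover, for the special case where N is equal in the two
administrations, this equation is sometimes written:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{SD_1^2 + SD_2^2}{N}}} \qquad (2) $$

And when, in addition to equal N, the variances are equal
over the two occasions or administrations, the equation may be
simplified to:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{2\,SD^2}{N}}} \qquad (3) $$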

The question these statistics (Equations 1, 2 and 3) permit
one to answer is whether the obtained difference between two
means, $\bar{X}_1$ and $\bar{X}_2$ (say, one group "with" and one
"without" the drug), is likely to have occurred by chance. We
decide by forming a ratio of the DIFFERENCE (numerator) to the
ERROR TERM (denominator). If the difference is many times the
error, we infer that the difference is not likely to be due to
chance. If the ratio is small, we infer the converse. When the
cost of being wrong is high, we take steps to improve our
precision by increasing sample size, or we select measures of
behavior which exhibit small between-subject differences, because
both of these serve to reduce the denominator. Practice usually
reduces between-subject differences (variances) as well. However,
in most cases of human performance measurement, a great deal of
the differences between subjects are not ERROR, and they are
large. People differ along behavioral dimensions.
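
The ratio of difference to error term can be sketched numerically. The example below assumes the common form of the independent-groups t in which the error term is the square root of SD1^2/N1 + SD2^2/N2; the function name and the numbers are illustrative only.

```python
import math


def independent_t(mean1, mean2, sd1, sd2, n1, n2):
    """t for two independent groups: mean difference over its standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


if __name__ == "__main__":
    # Same 2-point difference and same spread (SD = 2) at two sample sizes.
    for n in (8, 32):
        print(n, independent_t(10.0, 8.0, 2.0, 2.0, n, n))
    # prints "8 2.0" then "32 4.0"
```

Quadrupling the sample size doubles t here, which shows how costly it is to buy precision through sample size alone when between-subject variance is large.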
Although increasing the size of the sample would also serve
to reduce the error term, this option is not always available.
Indeed, in studies of environmental stress and drugs it is often
impossible, and probably unethical, to expose large groups to the
treatments. In these cases, for economy and precision, we usually
follow a repeated measures design in which each subject serves as
his own control. In such a case, the t statistic uses the
equation below.

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{SD_1^2 + SD_2^2 - 2\,r_{12}\,SD_1 SD_2}{N}}} \qquad (4) $$

Note that much of this equation is the same as before. The one
addition is the covariance term, which indicates that the error
term is reduced in proportion to how well the scores are
correlated over the two exposures. It is not obvious to what
extent the error term may be reduced, but two examples will
suffice.

Again, if we assume that the variances are equal (NOT
NECESSARILY SMALL!), then the equation simplifies to:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{2\,SD^2\,(1 - r_{12})}{N}}} \qquad (5) $$

Then, if $r_{12} = 0.00$, the equation returns to the t test (cf.
Equation 3 above) that we used for examining the differences
between two different groups:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{2\,SD^2}{N}}} \qquad (6) $$

Alternatively, if the retest reliability $r_{12} = 1.00$, then the
equation simplifies to:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{2\,SD^2\,(1 - 1.00)}{N}}} \qquad (7) $$

And in this case, the ERROR TERM approaches zero and thus the
obtained differences will be true and significant when they
occur. This effort, we believe, provides the best opportunity
that we know of for obtaining sensitive, nonintrusive measures of
human performance.
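
The effect of the retest correlation on the repeated-measures t can be sketched as follows. The code assumes the usual dependent-samples error term, (SD1^2 + SD2^2 - 2 r12 SD1 SD2) / N under the square root; the function name and the example values are hypothetical.

```python
import math


def repeated_measures_t(mean1, mean2, sd1, sd2, r12, n):
    """t for a repeated-measures design with retest correlation r12."""
    se_squared = (sd1 ** 2 + sd2 ** 2 - 2.0 * r12 * sd1 * sd2) / n
    if se_squared <= 0:
        # With r12 = 1.00 and equal SDs the error term vanishes.
        raise ValueError("error term is zero; t is undefined")
    return (mean1 - mean2) / math.sqrt(se_squared)


if __name__ == "__main__":
    # Fixed 2-point difference, SD = 2, N = 8; only the correlation changes.
    for r12 in (0.0, 0.5, 0.75, 0.9):
        print(r12, round(repeated_measures_t(10.0, 8.0, 2.0, 2.0, r12, 8), 2))
    # prints 2.0, 2.83, 4.0 and 6.32 for the four correlations
```

With r12 = 0 the result matches the independent-groups value, and each increase in r12 shrinks the error term without any change in sample size.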
Stability: We consider that tests must exhibit stability of
means, of standard deviations or variances, and of correlations
(cf. Bittner & Carter, 1981, for a review). To be considered
stable, means over sessions should be level or asymptotic and,
provided that other criteria are met, may also show a regular and
predictable trend. Standard deviations should be constant, or
they may increase in proportion to the mean. Correlations over
sessions, to be considered stable, should be constant, with no
change due to increasing separation of trials. It would not do to
take important steps to obtain high reliabilities and then have
them change over trials or sessions. When the latter change
occurs, it is considered to be an example of "superdiagonal form"
(Humphreys, 1960; Jones, 1970, 1972, 1980) and the task is rated
unstable. The consequence of such an occurrence is that, in the
extreme case (retest correlations decrease to zero over trials;
Kennedy, Bittner & Harbeson, 1981), the capacity or ability which
is being measured disappears and a new one takes its place. In
the less extreme case the factor structure of the test shifts to
some extent. So far as we can tell, no other attempt at test
battery standardization has set stability of the correlations as
a requirement. If correlations change by becoming lower: a) the
task will be insensitive to change and b) even if it were to
change, you wouldn't know what it tested.

For more information about methods for stability analysis,
see Bittner (1979); for sophisticated treatments see Jones,
Kennedy & Bittner (1981) and Steiger (1980).
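
One simple way to screen for the correlational pattern described above is to average the intersession correlations at each session separation and see whether they fall as the separation grows. The sketch below is only an informal screen under that assumption; it is not the procedure of Bittner (1979) or Steiger (1980), and the function names, the 0.05 tolerance, and the example matrices are hypothetical.

```python
def mean_correlation_by_lag(corr):
    """Average the intersession correlations at each session separation (lag)."""
    sessions = len(corr)
    means = {}
    for lag in range(1, sessions):
        values = [corr[i][i + lag] for i in range(sessions - lag)]
        means[lag] = sum(values) / len(values)
    return means


def shows_superdiagonal_form(corr, tolerance=0.05):
    """Flag a drop from adjacent-session to most-separated-session correlations.

    The tolerance is an arbitrary illustrative threshold, not a published
    criterion.
    """
    means = mean_correlation_by_lag(corr)
    lags = sorted(means)
    return (means[lags[0]] - means[lags[-1]]) > tolerance


if __name__ == "__main__":
    stable = [[1.0, 0.8, 0.8],
              [0.8, 1.0, 0.8],
              [0.8, 0.8, 1.0]]
    decaying = [[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.8],
                [0.6, 0.8, 1.0]]
    print(shows_superdiagonal_form(stable))    # False
    print(shows_superdiagonal_form(decaying))  # True
```

A formal analysis would test the equality of the correlations statistically rather than rely on a fixed threshold.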

Several items emerged from the PETER program in addition to 
the 30 so-called "golden hits" (Table II). We discovered or 
rediscovered outcomes that others had reported elsewhere, 
although not widely. 

A. Difference scores 

Difference scores have been reported to have poorer
reliability than the primary scores from which they are derived
(Cronbach & Furby, 1970). Carter and Krause (1983) demonstrated
algebraically that slope scores are a form of difference score,
as are percents, ratios and other derived scores. They then went
on to show empirically that slope scores in several experiments
within the PETER program possess very low reliabilities, if any
at all. Bittner et al. (1984) reported that derived scores fared
significantly poorer (p < .01) than other types of scores in the
100+ tests evaluated. Many of the information processing tasks so
popular these days employ a slope score as an index of
performance. Some of these are advocated as potential indicants
of individual traits or capabilities, and it is implied they may
be useful in selection. This advocacy is probably ill-advised.
Tests which have been indicted because they contain such scores
include Stroop tests, Sternberg's tests, Neisser's tests,
reaction time (e.g., Hick's Law) and others. Slope scores do show
group differences.
For example, the color-word condition on the Stroop test (Harbe- 
TABLE 2: GOOD

(Entries use the features defined in Table 1. TYPE is type of
administration; STABILIZATION is total stabilization time in
minutes with the differential value in parentheses.)

FACTOR: AIMING: FINE EYE-HAND COORDINATION (FLEISHMAN & ELLISON,
    1962)
  TYPE: G   STABILIZATION: 30(30)   RELIABILITY EFFICIENCY: 0.8
  REFERENCES: KRAUSE & WOLDSTAD (1983); FLEISHMAN & ELLISON (1962)

NAME: ARITHMETIC: VERTICAL ADDITION
  FACTOR: NUMBER FACILITY (N) (EKSTROM ET AL., 1976)
  TYPE: G   STABILIZATION: 48(8)
  REFERENCES: BITTNER, CARTER, KRAUSE, KENNEDY, & HARBESON (1983);
    CARTER & SBISA (1982)

NAME: ASSOCIATIVE MEMORY: NUMBER CORR: LIST 1
  FACTOR: ASSOCIATIVE MEMORY (MA) (EKSTROM ET AL.,
  TYPE: G   STABILIZATION: 20(20)
  REFERENCES: CARTER & KRAUSE (1982); UNDERWOOD ET AL. (1977);
    KRAUSE & KENNEDY (1980)

NAME: ATARI* AIR COMBAT MANEUVERING
  FACTOR: PURSUIT TRACKING (KENNEDY, BITTNER & JONES, 1981)
  TYPE: I   STABILIZATION: 135(135)
  REFERENCES: JONES, KENNEDY, & BITTNER (1981); KENNEDY, BITTNER,
    HARBESON, & JONES (1982)

NAME: ATARI* ANTIAIRCRAFT
  FACTOR: UNKNOWN
  TYPE: I   STABILIZATION: 126(126)
  REFERENCES: JONES & KENNEDY (IN PRESS) WITH ADAPTATIONS

NAME: CHOICE REACTION TIME: 1-CHOICE
  FACTOR: SIMPLE REACTION TIME (DONDERS, 1868)
  TYPE: I   STABILIZATION: 35(35)
  REFERENCES: KRAUSE & BITTNER (1982); TEICHNER & KREBS (1974)

NAME: CHOICE REACTION TIME: 4-CHOICE
  FACTOR: CHOICE REACTION TIME (DONDERS, 1868)
  TYPE: I   STABILIZATION: 50(50)
  REFERENCES: KRAUSE & BITTNER (1982); TEICHNER & KREBS (1974)

NAME: CODE SUBSTITUTION
  FACTOR: MEMORY ASSOC. (MA), PERCEPTUAL SPEED (P) (EKSTROM ET
    AL., 1976)
  TYPE: G   STABILIZATION: 16(16)
  REFERENCES: PEPPER, KENNEDY, BITTNER, & WIKER (1980); WECHSLER
    (1981)

NAME: FLEXIBILITY OF CLOSURE
  FACTOR: CLOSURE, FLEXIBILITY OF (CF) (EKSTROM ET AL., 1976)
  TYPE: G   STABILIZATION: 9(9)
  REFERENCES: BITTNER, ET AL. (1983); MORAN & MEFFORD (1959)

NAME: GRAMMATICAL REASONING
  FACTOR: REASONING, LOGICAL (RL) (EKSTROM ET AL., 1976)
  TYPE: G   STABILIZATION: 18(18)
  REFERENCES: BITTNER, ET AL. (1983); CARTER, KENNEDY, & BITTNER
    (1981); BADDELEY (1968)

NAME: GRAPHEMIC AND PHONEMIC ANALYSIS: SENSE/NONSENSE
  FACTOR: READING SPEED (BARON & MCKILLOP, 1975)
  TYPE: G   STABILIZATION: 16(16)
  REFERENCES: HARBESON, KENNEDY, KRAUSE, & BITTNER (1982A); BARON
    & MCKILLOP (1973); ROSE & FERNANDES (1977)

NAME: LETTER CLASSIFICATION: NAME
  FACTOR: RETRIEVAL FROM LTM & MATCHING (POSNER & MITCHELL, 1973)
  TYPE: G   STABILIZATION: 84(84)
  REFERENCES: HARBESON, ET AL. (1982A); POSNER & MITCHELL (1973);
    ROSE & FERNANDES (1977)

NAME: LETTER CLASSIFICATION: CATEGORY
  FACTOR: RETRIEVAL FROM LTM & MATCHING (POSNER & MITCHELL, 1973)
  TYPE: G   STABILIZATION: 121(121)
  REFERENCES: HARBESON, ET AL. (1982A); POSNER & MITCHELL (1973);
    ROSE & FERNANDES (1977)

Complete reference citations are contained
TABLE 2: GOOD (CONTINUED)

(Entries use the features defined in Table 1. ADMIN TIME is
administration time in minutes; TYPE is type of administration;
STABILIZATION is total stabilization time in minutes with the
differential value in parentheses.)

NAME: LOG. LATENCY TRANSFORMATION
  FACTOR: (EGAN, 1978)
  REFERENCES: CARTER & WOLDSTAD (1982); READER, BENEL, & RAHE
    (1981)

NAME: MINNESOTA RATE OF MANIPULATION: TURNING
  FACTOR: MANUAL DEXTERITY (FLEISHMAN & ELLISON, 1962)
  DOMAIN: M   ADMIN TIME: 2-4   TYPE: I   STABILIZATION: 10(10)
  RELIABILITY EFFICIENCY: 0.64
  REFERENCES: CARTER, STONE, & BITTNER (1982); SCHOENFELDT (1972)

NAME: PATTERN COMPARISON: NUMBER CORRECT MINUS NUMBER INCORRECT
  FACTOR: SPATIAL ABILITY (KLEIN & ARMITAGE, 1979)
  DOMAIN: P   ADMIN TIME: 2   TYPE: G   STABILIZATION: 18(18)
  RELIABILITY EFFICIENCY: 0.93
  REFERENCES: SHANNON, CARTER, & BOUDREAU (1983); KLEIN & ARMITAGE
    (1979); CARTER & SBISA (1982)

NAME: PERCEPTUAL SPEED
  FACTOR: PERCEPTUAL SPEED (PS) (EKSTROM ET AL., 1976)
  DOMAIN: P   ADMIN TIME: 2.5   TYPE: G   STABILIZATION: 23(15)
  RELIABILITY EFFICIENCY: 0.86
  REFERENCES: BITTNER, CARTER, KRAUSE ET AL. (1982); MORAN &
    MEFFORD (1959)

NAME: SEARCH FOR TYPOS IN PROSE: MEDIAN DETECTION TIME
  FACTOR: READING SPEED
  DOMAIN: P   ADMIN TIME: 6   TYPE: I   STABILIZATION: 54(54)
  RELIABILITY EFFICIENCY: 0.65
  REFERENCES: SHANNON ET AL. (1983); CARTER & KRAUSE (1983)

NAME: SPOKE CONTROL (C) TASK
  FACTOR: SPEED ARM MOVEMENT (FLEISHMAN & ELLISON, 1962)
  DOMAIN: M   ADMIN TIME: 0.67 APPROX   TYPE: G
  STABILIZATION: 1(1)   RELIABILITY EFFICIENCY: 0.95
  REFERENCES: BITTNER, LUNDY, KENNEDY, & HARBESON (1982)

NAME: STERNBERG ITEM RECOGNITION: POSITIVE SET 1
  FACTOR: SHORT TERM MEMORY SCAN (STERNBERG, 1966, 1975)
  DOMAIN: C   ADMIN TIME: 3   TYPE: I   STABILIZATION: 18(18)
  RELIABILITY EFFICIENCY: 0.70
  REFERENCES: CARTER, KENNEDY, BITTNER, & KRAUSE (1980); STERNBERG
    (1969, 1975)

NAME: STERNBERG ITEM RECOGNITION: POSITIVE SET 4
  FACTOR: SHORT-TERM MEMORY SCAN (STERNBERG, 1966, 1975)
  DOMAIN: C   ADMIN TIME: 3   TYPE: I   STABILIZATION: 15(9)
  RELIABILITY EFFICIENCY: 0.80
  REFERENCES: CARTER ET AL. (1980); CARTER & KRAUSE (1982);
    STERNBERG (1969, 1975)

NAME: STROOP: COLOR WORDS (CM)
  FACTOR: MIXED
  DOMAIN: C, P   ADMIN TIME: 0.5   TYPE: G
  STABILIZATION: 1.5(1.5)   RELIABILITY EFFICIENCY: 0.97
  REFERENCES: HARBESON, KRAUSE, KENNEDY, & BITTNER (1982B)

NAME: TRACKING: CRITICAL
  FACTOR: TRACKING, CRITICAL (JEX, MCDONNELL & PHATAK, 1966)
  DOMAIN: P, M   ADMIN TIME: 1   TYPE: I   STABILIZATION: 100(100)
  RELIABILITY EFFICIENCY: 0.60
  REFERENCES: DAMOS, BITTNER, KENNEDY, HARBESON, & KRAUSE (1984);
    JEX, MCDONNELL & PHATAK (1966)

NAME: TRACKING: DUAL CRITICAL
  FACTOR: TRACKING, CRITICAL & DUAL FACTOR? (DAMOS ET AL., 1981)
  DOMAIN: P, M   ADMIN TIME: 1   TYPE: I   STABILIZATION: 100(100)
  RELIABILITY EFFICIENCY: 0.50
  REFERENCES: DAMOS, BITTNER, KENNEDY, & HARBESON (1981)

NAME: VISUAL CONTRAST SENSITIVITY: METHOD OF INCREASING CONTRAST
  FACTOR: CONTRAST SENSITIVITY FUNCTION: 1, 2, 4, 8, 16 cpd
    (GINSBURG & EVANS, 1982)
  DOMAIN: P   ADMIN TIME: 3   TYPE: I   STABILIZATION: <1(<1)
    (five entries, all with these values)
  RELIABILITY EFFICIENCY: 0.51, 0.74, 0.53
  REFERENCES: GINSBURG, BITTNER, KENNEDY, & HARBESON (1983);
    GINSBURG & EVANS (1982)

NAME: WORD FLUENCY
  FACTOR: WORD FLUENCY (FW) (EKSTROM ET AL., 1976)
  DOMAIN: C   ADMIN TIME: 3   TYPE: G   STABILIZATION: <1(<1)
  REFERENCES: CARTER, CURLEY, & STYER (IN PRESS)

Complete reference citations are contained










son, Krause, Kennedy & Bittner, 1982) virtually always has a 
greater latency than the black-and-white word or color block 
conditions. The reliabilities of these basic scores are in the 
range of .90, but their differences have reliabilities which are
essentially zero.
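
The collapse of reliability in a difference score follows from the classical test theory expression for the reliability of a difference between two equally variable scores, r_dd = ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy). The sketch below assumes equal variances and uses hypothetical correlations between the two component scores; the source reports only the component reliabilities (about .90), not the correlation between them.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of X - Y for equal-variance scores (classical test theory).

    r_xx, r_yy: reliabilities of the two component scores.
    r_xy: correlation between the two component scores (assumed here).
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1.0")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)


if __name__ == "__main__":
    # Two component scores, each with reliability .90.
    for r_xy in (0.50, 0.85, 0.90):
        print(r_xy, round(difference_score_reliability(0.90, 0.90, r_xy), 2))
    # prints 0.8, 0.33 and 0.0 for the three correlations
```

When the two component scores correlate with each other as highly as each correlates with itself, nothing reliable remains in their difference, which is consistent with the pattern reported above for the Stroop conditions.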

B. Power from replications 

Another methodological finding within the PETER program had
to do with the tradeoff between sample size and test-retest
reliability in the special case of repeated measures where
variances are constant. If one uses "Student's t" formula, where
each subject serves as his own control, great power is obtained
by having high test-retest correlations. This issue is described
well in the paper by Carter, Kennedy & Bittner (1981), where a
nomogram (Figure 1) permits the tradeoff of sample size against
reliability of test scores to obtain equal precision of
significance. If one is dealing with ability measurement in a
repeated measures design in an unusual environment, it is
ordinarily difficult to increase the sample size beyond some
value, and 12 or 15 is not an uncommon upper limit. Sharpening
the t-test is ordinarily thought to be best effected by
minimizing between-subject variance or increasing sample size.
A third way is by maximizing the test-retest reliability. The
latter can be more economical than increasing sample size and,
if hazard is involved, is probably more ethical. A fourth method
is replication (Dunlap, Bittner, & Jones, 1983).



Figure 1. Nomogram relating sample size (N), intertrial
correlation (R), and the smallest significant (p = .05)
difference (D).
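
The tradeoff between sample size and reliability can be approximated with a few lines of arithmetic. With equal variances, the squared error term of the repeated-measures t is 2 SD^2 (1 - r) / N, so the error term stays constant when N changes in proportion to (1 - r). The sketch below rests on that assumption and ignores the change in the critical value of t with degrees of freedom, so it is a rough guide and not a substitute for the nomogram; the function names and numbers are hypothetical.

```python
def error_variance(sd, r12, n):
    """Squared error term of the equal-variance repeated-measures t."""
    return 2.0 * sd ** 2 * (1.0 - r12) / n


def equivalent_sample_size(n, r_old, r_new):
    """Sample size at reliability r_new giving the same error term as
    n subjects at reliability r_old (degrees of freedom ignored)."""
    if r_old >= 1.0 or r_new >= 1.0:
        raise ValueError("correlations must be below 1.0")
    return n * (1.0 - r_new) / (1.0 - r_old)


if __name__ == "__main__":
    # 60 subjects on a test with r = .50 versus a test with r = .90.
    n_needed = equivalent_sample_size(60, 0.50, 0.90)
    print(round(n_needed))  # 12
    print(round(error_variance(2.0, 0.50, 60), 4),
          round(error_variance(2.0, 0.90, 12), 4))  # both 0.0667
```

Under these assumptions, a test with a retest correlation of .90 given to 12 subjects yields about the same error term as a test with a correlation of .50 given to 60.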
C. Convergence of factor structure


A third issue (not completely studied but important in our
judgment) emerged when we evaluated families of tests. A
"memory" family, a "video-game" family, a "search and target
acquisition" family, an "information processing" family, and a
"cognitive" family were all studied. In these studies it
appeared that fewer factors were present late in practice than
earlier. That is, the factor structure resembled what was to be
expected from reports in the literature only during the initial
practice on the tasks. Once the tasks reached full stability,
the number of factors appeared to converge and be fewer than
earlier in practice. For example, four factorially different
tests from the Underwood battery (Underwood, Boruch & Malmi,
1977) were administered to the same population over three weeks
(Harbeson, Krause, & Kennedy, 1981). After dropping the tests
that had no reliability at all and/or that did not stabilize, a
single large factor seemed capable of describing the
performances that resulted. A similar finding occurred when a
series of information processing tasks was studied. Again, this
outcome was obtained after dropping the scores of unstable or
unreliable tasks. Of those that remained, it was not uncommon
for one factor to characterize all performance. This was never
pursued adequately in the PETER program and should be followed
up, because it is possible that, given adequate practice to
achieve stability on a series of tasks, fewer factors are
necessary than during the early stages of acquisition. This
result could have a profound effect on primary and secondary
selection, as well as other forms of testing.

D. Repeated Measures 

We were prepared to find long-term practice effects from
other work we had done, so we began with the idea that it might
take many replications to obtain stability. We originally set up
10-session studies, and lengthened them to 15 sessions (i.e.,
three weeks). As work progressed in the PETER Program, four
issues emerged related to extended practice: 1) improvements
persist over many and sometimes all sessions; 2) they are often
very large; 3) they are not limited to tests of SKILL but occur
in ABILITY tests (e.g., cognitive and information processing)
too; and 4) the improvement often occurs at different RATES for
different persons. This last issue led us to the quest for
"differential stability."

E. Differential Stability

"Differential stability" emerges from a notion offered by
Jones in the early 70s. Picking up on ideas discussed by
Humphreys (1960), Jones (1970, 1972) suggested that when practice
occurs, performance improves, and not always at the same RATE for
all subjects. Therefore, some people acquire skill rapidly and
others acquire it less rapidly. Moreover, TERMINAL skill levels
are not necessarily predictable from subjects' original
performance (or intercept), nor from the rate at which they
acquire terminal levels of performance. From work in the PETER
program we now recognize that the two-process theory (Jones,
1970), which had been developed to handle data in the area of
repeated measures of SKILL acquisition, extends to memory,
cognition, information processing, and probably all human
performances. Thus initial scores on ABILITY tests and on SKILL
tasks may not be perfectly correlated with terminal levels, nor
with the rate at which the terminal levels are reached.

More importantly, it follows that the terminal level of
performance may provide a better index of the true ability
(potential, capability, capacity, penchant, tendency, proclivity,
talent) of the individual than performance earlier in practice.
Therefore, if treatments (environments, chemicals) are
introduced, their effect can be better observed as changes in
performance from such a baseline. Obviously this approach has
implications for selection and training research too. A possible
criticism of the PETER program is that it concentrated all its
energies on the RELIABILITY (stability and sensitivity) of tests
and never got around to studying the VALIDITY. To some extent
this is true. All the tests which were studied had already
demonstrated their validity elsewhere (cf. Carter, Kennedy &
Bittner, 1980), but because adequate attention had not previously
been paid to stability in other efforts, it is problematic
whether the previously found validities were indeed valid.
However, the few validity studies conducted in the program
showed that a subset of the tests are sensitive to ship motion
(Wiker, Kennedy, McCauley & Pepper, 1979), vibration (Guignard,
Bittner & Harbeson, 1983), altitude (Bandaret, personal
communication, 1984) and visual kinematics (Kennedy, Ricard,
Bittner & Frank, 1984).

Until we began the PETER program, concerted efforts at 
repeated measures studies had not appeared with any regularity in 
the recent literature (Forrester, 1984). Yet it is only with 
such a paradigm that certain critical questions about abilities 
can be answered. In my opinion, our most important contributions 
were the focus on stability and reliability. By stability we 
mean "differential stability", and we called the reliability of 
test scores "task definition." It was an "individual differ- 
ences" approach. 

In previous programs of test battery development, attention 
was paid to stability of means (average scores) and to a lesser 
extent to the stability of standard deviations or variances. We 
added the requirement that the cross session correlations must be 
constant because of Jones' work (1970, 1972, 1980). We know of no 
other battery development effort where such a requirement was 
formally stipulated. This is not different from the need for
symmetry of the variance-covariance matrix, which is recognized
to be necessary for repeated measures ANOVA (Winer, 1971),
although in my experience some investigators incorrectly expect
that control groups or large samples or something else will make
this problem go away. Therefore, we attempted to show whether a
test was stable or not by showing that it met minimum
requirements for mean, standard deviation and cross-session
correlational stability. Differential stability is not just
statistical frou-frou. Lack of it implies that what is being
measured is changing in unknown ways.


Automated Performance Test System (APTS) 

We have begun development of an integrated performance
measurement and assessment system. It includes hardware (NEC
PC8201A) and software with the capability for data storage and
retrieval, including offline storage of data collected within the
system. This system is fully portable and, we believe, somewhat
rugged, although we have not tested to what extent. The device is
self-contained, battery operated (dry cell), and notebook sized,
with 64K of internal RAM and a built-in display with a resolution
of 240 x 64 elements. The measurement response time is 4.0
milliseconds. The bundled performance measurement software
interfaces with a desk top (or hand-held) printer. The software
is being designed by M.G. Smith, who in addition to serving as
chief troubleshooter for NTEC's Human Factors computer
laboratory, is also the Essex Orlando Head of Systems. Thus far,
we have 15 tests/tasks/games/questionnaires on the
microprocessor. All are automatically scored and registered. A
cartridge can be inserted to off-load a subject's scores, thus
leaving the testing device in the field for continued use.

The tests include: Grammatical Reasoning, Code Substitution,
a Video Game, Speed of Tapping (3 forms), Arithmetic, Tower of
Hanoi, Fitts' Histoforms, Dynamic Visual Acuity, Motion Sickness
History Questionnaire, Mood Adjective Checklist, Motion Sickness
Symptomatology, Pattern Comparison, Manikin Test, Sternberg's
Test, and Simple and Choice Reaction Time.

Thus far we have used forms of the battery before and after
simulator hops at three sites, and the tests appear to be at
least as sensitive as postural equilibrium and subject reports.
We are continuing our development under NASA sponsorship, and
have some efforts under way comparing paper-and-pencil with
microprocessor presentations of stimuli. The Navy has begun to
use it at Warminster (NADC) before and after spin-test work with
F/A 18 simulations on the centrifuge. The USAF (Aeromedical
Research Laboratory at Brooks AFB) has begun a program to study
performances using these devices at simulated altitudes, and
Louisiana State University Medical Center is using a version to
study motion sickness drug effects. We are in the process of
adding some memory, information processing, spatial perception,
and visual function tests.


(NOTE: Retarding the learning process may also be a sensi- 
tive indicant in its own right, but such a question is different 
from the question of whether performance per se is disrupted). 